Let $X_1, \ldots, X_n$ be i.i.d. observations, where $X_i = Y_i + \sigma_n Z_i$ and the $Y$'s and $Z$'s are independent. Assume that the $Y$'s are unobservable and that they have the density $f$, and also that the $Z$'s have a known density $k$. Furthermore, let $\sigma_n$ depend on $n$ and let $\sigma_n \to 0$ as $n \to \infty$. We consider the deconvolution problem, i.e. the problem of estimation of the density $f$ based on the sample $X_1, \ldots, X_n$. A popular estimator of $f$ in this setting is the deconvolution kernel density estimator. We derive its asymptotic normality under two different assumptions on the relation between the sequence $\sigma_n$ and the sequence of bandwidths $h_n$. We also consider several simulation examples which illustrate different types of asymptotics corresponding to the derived theoretical results and which show that there exist situations where models with $\sigma_n \to 0$ have to be preferred to the models with fixed $\sigma$.

Keywords: Asymptotic normality, deconvolution, Fourier inversion, 
kernel type density estimator. 
1 Introduction

The classical deconvolution problem consists of estimation of the density $f$ of a random variable $Y$ based on the i.i.d. copies $Y_1, \ldots, Y_n$ of $Y$, which are corrupted by an additive measurement error. More precisely, let $X_1, \ldots, X_n$ be i.i.d. observations, where $X_i = Y_i + Z_i$ and the $Y$'s and $Z$'s are independent. Assume that the $Y$'s are unobservable and that they have the density $f$, and also that the $Z$'s have a known density $k$. Such a model of measurements contaminated by an additive measurement error has numerous applications in practice and arises in a variety of fields, see for instance Carroll et al. (2006). Notice that the $X$'s have a density $g$ which is equal to the convolution of $f$ and $k$. The deconvolution problem consists in estimation of the density $f$ based on the sample $X_1, \ldots, X_n$.

A popular estimator of $f$ is the deconvolution kernel density estimator, which was proposed in Carroll and Hall (1988) and Stefanski and Carroll (1990), see also pp. 231-233 in Wasserman (2007) for an introduction. Additional recent references can be found e.g. in van Es et al. (2008). Let $w$ be a kernel and $h_n > 0$ a bandwidth. The deconvolution kernel density estimator $f_{nh_n}$ is constructed as

$$f_{nh_n}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_w(h_n t)\,\phi_{emp}(t)}{\phi_k(t)}\,dt = \frac{1}{nh_n}\sum_{j=1}^{n} w_{h_n}\!\left(\frac{x - X_j}{h_n}\right), \tag{1}$$

where $\phi_{emp}$ denotes the empirical characteristic function, i.e. $\phi_{emp}(t) = n^{-1}\sum_{j=1}^{n}\exp(itX_j)$, $\phi_w$ and $\phi_k$ are Fourier transforms of functions $w$ and $k$, respectively, and

$$w_{h_n}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_w(t)}{\phi_k(t/h_n)}\,dt.$$

Depending on the rate of decay of the characteristic function $\phi_k$ at plus and minus infinity, deconvolution problems are usually divided into two groups, ordinary smooth deconvolution problems and supersmooth deconvolution problems. In the first case it is assumed that $\phi_k$ decays to zero at plus and minus infinity algebraically (an example of such $k$ is the Laplace density) and in the second case the decay is essentially exponential (in this case $k$ can be e.g. a standard normal density). In general, the faster $\phi_k$ decays at plus and minus infinity (and consequently the smoother the density $k$ is), the more difficult the deconvolution problem becomes, see e.g. Fan (1991a). The usual smoothness condition imposed on the target density $f$ is that it belongs to the class $\mathcal{C}_{\alpha,L} = \{f : |f^{(\ell)}(x) - f^{(\ell)}(x+t)| \le L|t|^{\alpha-\ell} \text{ for all } x \text{ and } t\}$, where $\alpha > 0$, $\ell = \lfloor\alpha\rfloor$ (the integer part of $\alpha$) and $L > 0$ are known constants, cf. Fan (1991a). Then, if $k$ is ordinary smooth of order $\beta$ (see e.g. Assumption C (ii) below for a definition), the optimal rate of convergence for the estimator $f_{nh_n}(x)$ with the mean square error used as the performance criterion is $n^{-\alpha/(2\alpha+2\beta+1)}$, while if $k$ is supersmooth of order $\lambda$ (see Assumption B (ii)), the optimal rate of convergence is $(\log n)^{-\alpha/\lambda}$, see Fan (1991a). The latter convergence rate is rather slow and it suggests that the deconvolution problem is not practically feasible in the supersmooth case, since it seems that samples of very large size are required to obtain reasonable estimates. Hence at first sight it appears that nonparametric deconvolution with e.g. the Gaussian error distribution (a popular choice in practice) cannot lead to meaningful results for moderate sample sizes and is practically irrelevant.
However, it was demonstrated by exact MISE (mean integrated square error) computations in Wand (1998) that, despite the slow convergence rate in the supersmooth case, the deconvolution kernel density estimator performs well for reasonable sample sizes, if the noise level measured by the noise-to-signal ratio $\mathrm{NSR} = \mathrm{Var}[Z](\mathrm{Var}[Y])^{-1}\,100\%$, cf. Wand (1998), is not too high. Clearly, an 'ideal case' in a deconvolution problem would be that not only the sample size $n$ is large, but also that the error term variance is small. This leads one to an idealised model $X = Y + \sigma_n Z$, where now $\mathrm{Var}[Z] = 1$ and $\sigma_n$ depends on $n$ and tends to zero as $n \to \infty$. The idea to consider $\sigma_n \to 0$ was already proposed in Fan (1992) and was further developed in Delaigle (2008). We refer to these works for additional motivation. These papers deal mainly with the mean integrated square error of the estimator of $f$. Here we will study its asymptotic normality. Asymptotic normality of the deconvolution kernel density estimator in the deconvolution problem with fixed error term variance was derived in Fan (1991b) and van Es and Uh (2004, 2005). For a practical situation where $\sigma_n \to 0$ can arise, see e.g. Section 4.2 of Delaigle (2008), where an example of measurement of sucrase in intestinal tissues is considered and inference is drawn on the density of the sucrase content. Sucrase is a name of several enzymes that catalyse the hydrolysis of sucrose to fructose and glucose.

It trivially follows from (1) that the deconvolution kernel density estimator for the model that we consider, i.e. $X_i = Y_i + \sigma_n Z_i$ with $\sigma_n \to 0$ as $n \to \infty$, is defined as

$$f_{nh_n}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_w(h_n t)\,\phi_{emp}(t)}{\phi_k(\sigma_n t)}\,dt = \frac{1}{nh_n}\sum_{j=1}^{n} w_{r_n}\!\left(\frac{x - X_j}{h_n}\right), \tag{2}$$

where

$$w_{r_n}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_w(t)}{\phi_k(r_n t)}\,dt, \tag{3}$$

$r_n = \sigma_n/h_n$ and $\phi_k$ now denotes the characteristic function of the random variable $Z$ with a density $k$. We will also use $\rho_n = r_n^{-1} = h_n/\sigma_n$ and in this case we will denote the function $w_{r_n}$ by $w_{\rho_n}$. Observe that if $w$ is symmetric, (2) will be real-valued.
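As an illustration of how (2) and (3) can be evaluated numerically, the following minimal sketch computes $w_{r}$ by quadrature and plugs it into the kernel form of the estimator. It is not the authors' implementation. It assumes a standard normal $Z$, so that $\phi_k(t) = e^{-t^2/2}$, and the kernel with $\phi_w(t) = (1 - t^2)^3$ on $[-1, 1]$ that is used in the simulation section; the function names `simpson`, `w_r` and `deconv_kde` are hypothetical.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3.0


def phi_w(t):
    """Assumed kernel transform: (1 - t^2)^3 on [-1, 1], zero elsewhere."""
    return (1.0 - t * t) ** 3 if abs(t) <= 1.0 else 0.0


def phi_k(t):
    """Characteristic function of a standard normal Z (assumed error law)."""
    return math.exp(-0.5 * t * t)


def w_r(u, r):
    """Kernel (3). Both transforms are even, so only the cosine part of
    exp(-itu) survives and the integral reduces to [0, 1]."""
    integrand = lambda t: math.cos(t * u) * phi_w(t) / phi_k(r * t)
    return simpson(integrand, 0.0, 1.0) / math.pi


def deconv_kde(x, sample, h, sigma):
    """Estimator (2) in its kernel form, with r_n = sigma / h."""
    r = sigma / h
    return sum(w_r((x - xj) / h, r) for xj in sample) / (len(sample) * h)


if __name__ == "__main__":
    data = [-0.8, -0.1, 0.3, 0.4, 1.2]  # toy data, for illustration only
    print(deconv_kde(0.0, data, h=0.5, sigma=0.1))
```

Each evaluation costs one quadrature per observation, so for large samples one would tabulate $w_{r_n}$ on a grid or work directly with the Fourier form in (2).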

To get a consistent estimator, we need to control the bandwidth $h_n$. The usual condition to get consistency in kernel density estimation is that the bandwidth $h_n$ depends on $n$ and is such that $h_n \to 0$, $nh_n \to \infty$, see e.g. Theorem 6.27 in Wasserman (2007). Since in our model we assume $\sigma_n \to 0$, additional assumptions on $h_n$, which relate it to $\sigma_n$, are needed. In essence we distinguish two cases: $\sigma_n/h_n \to r$ with $0 \le r < \infty$, or $\sigma_n/h_n \to \infty$. Conditions on the target density $f$, the density $k$ of $Z$ and the kernel $w$ will be tailored to these two cases.

The remaining part of the paper is organised as follows: in Section 2 we will present the obtained results. Section 3 contains several simulation examples illustrating the results from Section 2. All the proofs are given in Section 4.

2 Results 

2.1 The case $0 \le r < \infty$

We first consider the case when $0 \le r < \infty$. We will need the following conditions on $f$, $w$, $k$ and $h_n$.

Assumption A.

(i) The density $f$ is such that $\phi_f$ is integrable.
(ii) $\phi_k(t) \ne 0$ for all $t \in \mathbb{R}$ and $\phi_k$ has a bounded derivative.
(iii) The kernel $w$ is symmetric, bounded and continuous. Furthermore, $\phi_w$ has support $[-1, 1]$, $\phi_w(0) = 1$, $\phi_w$ is differentiable and $|\phi_w(t)| \le 1$.
(iv) The bandwidth $h_n$ depends on $n$ and we have $h_n \to 0$, $nh_n \to \infty$.
(v) $\sigma_n \to 0$ and $r_n = \sigma_n/h_n \to r$, where $0 \le r < \infty$.

Notice that Assumption A (i) implies that $f$ is continuous and bounded. The assumption $\phi_k(t) \ne 0$ for all $t \in \mathbb{R}$ is standard in kernel deconvolution and is unavoidable when using the Fourier inversion approach to deconvolution. Furthermore, a variety of kernels satisfy Assumption A (iii), see e.g. examples in van Es and Uh (2005). Also notice that $w$ is not necessarily a density, since it may take on negative values. Observe that in Assumption A (v) we do not exclude the case $r = 0$.

The following theorem establishes asymptotic normality in this case. 

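Theorem 1. Let Assumption A hold and let the estimator $f_{nh_n}$ be defined by (2). Then

$$\sqrt{nh_n}\,\bigl(f_{nh_n}(x) - E[f_{nh_n}(x)]\bigr) \xrightarrow{\mathcal{D}} \mathcal{N}\left(0,\; f(x)\int_{-\infty}^{\infty}|w_r(u)|^2\,du\right) \tag{4}$$

as $n \to \infty$.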



Notice that unlike the asymptotic normality theorem for the deconvolution kernel density estimator in the supersmooth deconvolution problem with fixed $\sigma$, that was obtained in van Es and Uh (2004, 2005), the asymptotic variance in (4) now depends on $f$. When $r_n = 0$ for all $n$, we recover the asymptotic normality theorem for an ordinary kernel density estimator, see Parzen (1962).
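The limit variance in (4) is easy to evaluate numerically, because by Parseval's identity $\int |w_r(u)|^2\,du = (2\pi)^{-1}\int_{-1}^{1} |\phi_w(t)|^2/|\phi_k(rt)|^2\,dt$. The sketch below uses this identity to approximate the standard deviation suggested by Theorem 1 for finite $n$. It is only an illustration under assumptions that are not part of the theorem: a standard normal $Z$, the kernel with $\phi_w(t) = (1 - t^2)^3$, and hypothetical function names.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3.0


def phi_w(t):
    """Assumed kernel transform: (1 - t^2)^3 on [-1, 1]."""
    return (1.0 - t * t) ** 3 if abs(t) <= 1.0 else 0.0


def phi_k(t):
    """Characteristic function of a standard normal Z (assumed)."""
    return math.exp(-0.5 * t * t)


def int_w_r_squared(r):
    """int |w_r(u)|^2 du, computed on the Fourier side via Parseval."""
    integrand = lambda t: (phi_w(t) / phi_k(r * t)) ** 2
    return simpson(integrand, -1.0, 1.0) / (2.0 * math.pi)


def theorem1_sd(fx, n, h, sigma):
    """Approximate SD of the estimator at x: sqrt(f(x) int|w_r|^2 / (n h))."""
    return math.sqrt(fx * int_w_r_squared(sigma / h) / (n * h))


if __name__ == "__main__":
    f0 = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at x = 0
    print(theorem1_sd(f0, n=1000, h=0.1, sigma=0.1))
```

For $r = 0$ the function `int_w_r_squared` returns $\int w(u)^2\,du$, which is the constant in the classical result of Parzen (1962); the value grows with $r$, reflecting the price paid for the measurement error.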



2.2 The case $r = \infty$

We turn to the case $r = \infty$. In this case we have to make the distinction between the ordinary smooth and supersmooth deconvolution problems. We first consider the supersmooth case. We will need the following condition.

Assumption B.

(i) The density $f$ is such that $\phi_f$ is integrable.

(ii) $\phi_k(t) \ne 0$ for all $t \in \mathbb{R}$ and $\phi_k(t) \sim C|t|^{\lambda_0}\exp(-|t|^{\lambda}/\mu)$ as $|t| \to \infty$ for some constants $\lambda > 1$, $\mu > 0$ and real constants $\lambda_0$ and $C$.

(iii) $w$ is a bounded, symmetric and continuous function. Furthermore, $\phi_w$ is supported on $[-1, 1]$, $\phi_w(0) = 1$ and $|\phi_w(t)| \le 1$. Moreover,

$$\phi_w(1 - t) = At^{\alpha} + o(t^{\alpha})$$

as $t \downarrow 0$, where $A \in \mathbb{R}$ and $\alpha \ge 0$ are some numbers.

(iv) The bandwidth $h_n$ depends on $n$ and we have $h_n \to 0$, $nh_n \to \infty$.
(v) $\sigma_n \to 0$ and $\sigma_n^{\lambda}/h_n^{\lambda-1} \to \infty$.



Assumption B (i)-(iv) correspond to those in van Es and Uh (2005). Assumption B (v) is stronger than $\sigma_n/h_n \to \infty$, but it is essential in the proof of Theorem 2. Denote $\zeta(\rho_n) = \exp(1/(\mu\rho_n^{\lambda}))$. The following theorem holds true.

Theorem 2. Let Assumption B hold and let the estimator $f_{nh_n}$ be defined by (2). Furthermore, assume that $E[Y^2] < \infty$ and $E[Z^2] < \infty$. Then

$$\frac{\sqrt{n}\,\sigma_n}{\rho_n^{\lambda(1+\alpha)+\lambda_0-1}\,\zeta(\rho_n)}\,\bigl(f_{nh_n}(x) - E[f_{nh_n}(x)]\bigr) \xrightarrow{\mathcal{D}} \mathcal{N}\left(0,\; \frac{A^2}{2\pi^2C^2}\left(\frac{\mu}{\lambda}\right)^{2+2\alpha}(\Gamma(\alpha+1))^2\right) \tag{5}$$

as $n \to \infty$.

When $\sigma_n = 1$ for all $n$, the arguments given in the proof of this theorem are still valid, and hence we can also recover the asymptotic normality theorem of van Es and Uh (2005) for the deconvolution kernel density estimator in the supersmooth deconvolution problem.

Finally, we consider the ordinary smooth case. 

Assumption C.

(i) The density $f$ is such that $\phi_f$ is integrable.

(ii) $\phi_k(t) \ne 0$ for all $t \in \mathbb{R}$ and $\phi_k(t)t^{\beta} \to C$, $\phi_k'(t)t^{\beta+1} \to -\beta C$ as $t \to \infty$, where $\beta \ge 0$ and $C \ne 0$ are some constants.

(iii) $\phi_w$ is symmetric and continuously differentiable. Furthermore, $\phi_w$ is supported on $[-1, 1]$, $|\phi_w(t)| \le 1$ and $\phi_w(0) = 1$.

(iv) The bandwidth $h_n$ depends on $n$ and we have $h_n \to 0$, $nh_n \to \infty$.

(v) $\sigma_n \to 0$ and $\sigma_n/h_n \to \infty$.

For the discussion on Assumption C (i)-(iv) see Fan (1991b).



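Theorem 3. Let Assumption C hold and let the estimator $f_{nh_n}$ be defined by (2). Then

$$\sqrt{nh_n}\,\rho_n^{\beta}\,\bigl(f_{nh_n}(x) - E[f_{nh_n}(x)]\bigr) \xrightarrow{\mathcal{D}} \mathcal{N}\left(0,\; \frac{f(x)}{2\pi C^2}\int_{-\infty}^{\infty}|t|^{2\beta}|\phi_w(t)|^2\,dt\right) \tag{6}$$

as $n \to \infty$.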



When $\sigma_n = 1$, we recover the asymptotic normality theorem of Fan (1991b) for a deconvolution kernel density estimator in the ordinary smooth deconvolution problem.
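The constant in the limit variance of Theorem 3 can be checked numerically in a concrete ordinary smooth case. The sketch below assumes that the limit variance in (6) has the form $f(x)(2\pi C^2)^{-1}\int |t|^{2\beta}|\phi_w(t)|^2\,dt$, takes a standard Laplace error with $\phi_k(t) = 1/(1+t^2)$ (so that $\beta = 2$ and $C = 1$), and uses the kernel with $\phi_w(t) = (1 - t^2)^3$. By Parseval's identity, $\rho^{2\beta}\int |w_{\rho}(u)|^2\,du$ should approach that constant as $\rho \to 0$. All names in the code are hypothetical.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3.0


BETA, C = 2.0, 1.0  # order and constant of the assumed Laplace error


def phi_w(t):
    """Assumed kernel transform: (1 - t^2)^3 on [-1, 1]."""
    return (1.0 - t * t) ** 3 if abs(t) <= 1.0 else 0.0


def phi_k_laplace(t):
    """Characteristic function of a Laplace variable with unit scale."""
    return 1.0 / (1.0 + t * t)


def scaled_kernel_energy(rho):
    """rho^(2 beta) * int |w_rho(u)|^2 du, computed via Parseval."""
    integrand = lambda t: (phi_w(t) / phi_k_laplace(t / rho)) ** 2
    return rho ** (2.0 * BETA) * simpson(integrand, -1.0, 1.0) / (2.0 * math.pi)


def theorem3_constant():
    """(1 / (2 pi C^2)) * int |t|^(2 beta) |phi_w(t)|^2 dt."""
    integrand = lambda t: abs(t) ** (2.0 * BETA) * phi_w(t) ** 2
    return simpson(integrand, -1.0, 1.0) / (2.0 * math.pi * C * C)


if __name__ == "__main__":
    limit = theorem3_constant()
    for rho in (1.0, 0.3, 0.1, 0.01):
        print(rho, scaled_kernel_energy(rho) / limit)
```

The printed ratios indicate how small $\rho_n = h_n/\sigma_n$ has to be before the asymptotic constant becomes a good description of the finite-sample variance.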

As a general conclusion, we notice that Theorems 1-3 demonstrate that the asymptotics of $f_{nh_n}(x)$ depend in an essential way on the relationship between the sequences $\sigma_n$ and $h_n$. In case $r_n \to r < \infty$, the asymptotics are similar to those in the direct density estimation, while when $r = \infty$, they resemble those in the classical deconvolution problem.

3 Simulation examples 

In this section we consider several simulation examples for the supersmooth deconvolution case covered by Theorems 1 and 2. We do not pretend to produce an exhaustive simulation study. Our examples serve as a mere illustration of the asymptotic results from the previous section.

It follows from Theorems 1-3 that for a fixed point $x$ and a large enough $n$, a suitably centred and normalised estimator $f_{nh_n}(x)$ is approximately normally distributed with mean and standard deviation given in these three theorems. Suppose we have fixed the sample size $n$ and the bandwidth $h_n$, generated a sample of size $n$, evaluated the estimate $f_{nh_n}(x)$ and have repeated this procedure $N$ times, where $N$ is sufficiently large. This will give us $N$ values of $f_{nh_n}(x)$. We then can evaluate the sample mean and the sample standard deviation of this set of values $f_{nh_n}(x)$. Under appropriate conditions these should be close to the ones predicted by Theorems 1 and 2. In particular, in the setting of Theorem 1 the mean $M$ and the standard deviation $SD$ must be approximately given by

$$M = f * w_{h_n}(x), \qquad SD = \frac{1}{\sqrt{nh_n}}\sqrt{f(x)\int_{-\infty}^{\infty}|w_{\sigma_n/h_n}(u)|^2\,du}, \tag{7}$$

while in the setting of Theorem 2 they are approximately equal to

$$M = f * w_{h_n}(x), \qquad SD = \frac{A}{\sqrt{2}\,\pi C}\left(\frac{\mu}{\lambda}\right)^{1+\alpha}\Gamma(\alpha+1)\,\frac{\rho_n^{\lambda(1+\alpha)+\lambda_0-1}\,\zeta(\rho_n)}{\sqrt{n}\,\sigma_n}. \tag{8}$$



We first concentrate on Theorem 1. Let $f$ and $k$ be standard normal densities, let $n = 1000$ and suppose $\sigma_n = 0.1$. The noise level measured by the noise-to-signal ratio is thus rather low and equals $\mathrm{NSR} = 1\%$. Suppose that a kernel $w$ is given by

$$w(x) = \frac{48\cos x}{\pi x^4}\left(1 - \frac{15}{x^2}\right) - \frac{144\sin x}{\pi x^5}\left(2 - \frac{5}{x^2}\right). \tag{9}$$



Its corresponding Fourier transform is given by $\phi_w(t) = (1 - t^2)^3\,1_{[-1,1]}(t)$. Here $A = 8$ and $\alpha = 3$. A good performance of this kernel in the deconvolution context was established in Delaigle and Hall (2006). Assume that the number of replications $N = 500$. Before we proceed any further, we need to fix the bandwidth. We opted for a theoretically optimal bandwidth, i.e. the bandwidth that minimises

$$\mathrm{MISE}[f_{nh_n}] = E\left[\int_{-\infty}^{\infty}\bigl(f_{nh_n}(x) - f(x)\bigr)^2\,dx\right], \tag{10}$$

the mean integrated squared error of the estimator $f_{nh_n}$. To find this optimal bandwidth, we considered a sequence of bandwidths $h = 0.01 \cdot k$, $k = 1, 2, \ldots, K$, where $K$ is a large enough integer, passed to the Fourier transforms in (10) via Parseval's identity, cf. Wand (1998), and then used numerical integration. This procedure resulted in $h_n = 0.1$. For real data the above method does not work, because (10) depends on the unknown $f$, and we refer to Delaigle (2008) for data-dependent bandwidth selection methods. However, once again we stress the fact that in order to reach the specific goal of these simulation examples, the bandwidth $h_n$ must be the same for all $N$ replications. This excludes the use of a data-dependent procedure. To speed up the computation of the estimates, binning of observations was used, see e.g. Silverman (1982) and Jones and Lotwick (1984) for related ideas in kernel density estimation.

Under these assumptions we evaluated the sample means and standard deviations of $f_{nh_n}(x)$ for $x$ from a grid on the interval $[-3, 3]$ with mesh size $\Delta = 0.1$. These then were plotted in Figure 1 together with the theoretical values from (7). We notice that the sample means match the theoretical values very well. This can also be explained by the fact that the bandwidth $h_n$ is quite small. The match between the sample standard deviations and the theoretical standard deviations is slightly less satisfactory. It also turns out that Theorem 2 is clearly not applicable in this case: an evaluation of the theoretical standard deviation $SD$ in (8) yields a very large value 3.41646, which grossly overestimates the sample standard deviation for any point $x$. The reason for this seems to be that both the sample size $n$ and the error variance $\sigma_n^2$ appear to be too small for the setting of Theorem 2.

Figure 1: The sample means and the theoretical means (left display, a dotted and a solid line, respectively) together with the sample standard deviations and the two theoretical standard deviations corresponding to Theorems 1 and 2 (right display, a dotted, a solid and a dashed line, respectively). Here the target density $f$ and the density $k$ of a random variable $Z$ are standard normal densities, the noise variance $\sigma_n^2 = 0.01$, the sample size $n = 1000$, the bandwidth $h_n = 0.1$ and the kernel $w$ is given by (9). The number of replications equals $N = 500$. The integral in (11) and not its asymptotic expansion was used to evaluate the standard deviation in Theorem 2.
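The replication scheme used in this first example can be sketched in a few lines. The code below is only an illustration under stated assumptions, not the code used for the figures: it evaluates the estimator through its Fourier form in (2) at the single point $x = 0$, uses no binning, takes standard normal $f$ and $k$ with $\phi_w(t) = (1 - t^2)^3$, and reduces the number of replications to 20 to keep a pure-Python run short. The names `estimate_at`, `monte_carlo` and `theoretical_mean` are hypothetical.

```python
import math
import random


def simpson_rule(a, b, n):
    """Nodes and weights of the composite Simpson rule (n even)."""
    step = (b - a) / n
    nodes = [a + i * step for i in range(n + 1)]
    weights = [step / 3.0 * (1 if i in (0, n) else 4 if i % 2 else 2)
               for i in range(n + 1)]
    return nodes, weights


def estimate_at(x, sample, h, sigma, nodes, weights):
    """Estimator (2) at x. By symmetry of phi_w and phi_k it equals
    (1/pi) int_0^{1/h} phi_w(h t) exp(sigma^2 t^2 / 2) mean_j cos(t (X_j - x)) dt."""
    n = len(sample)
    total = 0.0
    for t, wt in zip(nodes, weights):
        emp = sum(math.cos(t * (xj - x)) for xj in sample) / n
        total += wt * (1.0 - (h * t) ** 2) ** 3 * math.exp(0.5 * (sigma * t) ** 2) * emp
    return total / math.pi


def monte_carlo(x, n, h, sigma, reps, seed=0):
    """Sample mean and SD of the estimate over `reps` simulated samples."""
    rng = random.Random(seed)
    nodes, weights = simpson_rule(0.0, 1.0 / h, 100)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(estimate_at(x, sample, h, sigma, nodes, weights))
    mean = sum(values) / reps
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (reps - 1))
    return mean, sd


def theoretical_mean(x, h):
    """M = f * w_h(x) for standard normal f, computed on the Fourier side."""
    nodes, weights = simpson_rule(0.0, 1.0 / h, 400)
    return sum(wt * (1.0 - (h * t) ** 2) ** 3 * math.exp(-0.5 * t * t) * math.cos(t * x)
               for t, wt in zip(nodes, weights)) / math.pi


if __name__ == "__main__":
    print(monte_carlo(0.0, n=1000, h=0.1, sigma=0.1, reps=20))
    print(theoretical_mean(0.0, 0.1))
```

The same bandwidth is used in every replication, in line with the remark above that a data-dependent bandwidth would defeat the purpose of the comparison.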

At this point the following remark is in order. Reviewing the proof of Theorem 2, one sees that the following asymptotic equivalence is used:

$$\int_{0}^{1}\phi_w(s)\exp[s^{\lambda}/(\mu h^{\lambda})]\,ds \sim A\,\Gamma(\alpha+1)\left(\frac{\mu}{\lambda}h^{\lambda}\right)^{1+\alpha}e^{1/(\mu h^{\lambda})} \tag{11}$$

as $h \to 0$. This explains the shape of the normalising constant in Theorem 2. However, the direct numerical evaluation of the integral in (11) (with the same parameters and the kernel as in our example above) shows that the approximation in (11) is good only for very small values of $h$ and that it is quite inaccurate for larger values of $h$, see a discussion in van Es and Gugushvili (2008). Obviously, one can correct for the poor approximation of the sample standard deviation by the theoretical standard deviation by using the left-hand side of (11) instead of its approximation. Nevertheless, this still leads to a very large (compared to the sample standard deviation) value of the theoretical standard deviation for our particular example, namely 0.034477.
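The quality of the approximation in (11), and the corrected standard deviation, can be examined with a short numerical sketch. It assumes the setting of the example: standard normal $k$, so that $\lambda = 2$, $\mu = 2$, $\lambda_0 = 0$ and $C = 1$, and the kernel with $\phi_w(t) = (1 - t^2)^3$, so that $A = 8$ and $\alpha = 3$. The lower integration limit 0 and the function names are assumptions made here for illustration. The factor $e^{1/(\mu h^{\lambda})}$ is cancelled analytically in the ratio to avoid overflow for small $h$.

```python
import math


def simpson(func, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3.0


# Assumed example: kernel (9) and standard normal k.
A, ALPHA, LAM, MU, C = 8.0, 3.0, 2.0, 2.0, 1.0


def phi_w(s):
    """Kernel transform (1 - s^2)^3 on [0, 1]."""
    return (1.0 - s * s) ** 3


def ratio_11(h):
    """Left-hand side of (11) divided by its right-hand side."""
    scale = MU * h ** LAM
    lhs = simpson(lambda s: phi_w(s) * math.exp((s ** LAM - 1.0) / scale), 0.0, 1.0)
    rhs = A * math.gamma(ALPHA + 1.0) * (scale / LAM) ** (1.0 + ALPHA)
    return lhs / rhs


def corrected_sd(n, sigma, h):
    """SD in the setting of Theorem 2 with the integral in (11) kept exact
    (valid here because lambda_0 = 0): integral / (sqrt(2) pi C h sqrt(n))."""
    rho = h / sigma
    integral = simpson(lambda s: phi_w(s) * math.exp(s ** LAM / (MU * rho ** LAM)), 0.0, 1.0)
    return integral / (math.sqrt(2.0) * math.pi * C * sigma * rho * math.sqrt(n))


if __name__ == "__main__":
    for h in (1.0, 0.5, 0.2, 0.1, 0.05):
        print(h, ratio_11(h))
    print(corrected_sd(n=1000, sigma=0.1, h=0.1))
```

In `ratio_11` the argument plays the role of $\rho_n = h_n/\sigma_n$, which equals 1 in the first example. With $n = 1000$, $\sigma_n = 0.1$ and $h_n = 0.1$, `corrected_sd` agrees with the corrected value 0.034477 quoted above.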

In our second example we left $\sigma_n$, $n$ and $k$ the same as above, but as $f$ we took a mixture of two normal densities with means $-1$ and $1$ and equal variance 0.375. The mixing probability was taken to be equal to 0.5. The density $f$ is bimodal and is plotted in Figure 2. The simulation results for this density are reported in Figure 3. The conclusions are the same as for the first example. One can easily recognise a bimodal shape of the target density $f$ by looking at the sample standard deviation.

Figure 2: The density $f$: a mixture of two normal densities with means $-1$ and $1$ and equal variance 0.375. The mixing probability is taken to be equal to 0.5.

In our third example we again considered the standard normal density, but we increased the sample size to $n = 10000$. The results are reported in Figure 4. As can be seen, the match between the sample standard deviations and the theoretical standard deviations as computed using Theorem 1 is less satisfactory than in the previous example. The explanation lies in the fact that, even though the noise level is low when judged by itself, it is still a bit large compared to the sample size that we have in this case. Also Theorem 2 remains inapplicable, as it still produces considerably larger values of the theoretical standard deviation compared to the sample standard deviation (0.0166319 after the necessary correction using (11)).

In the next three examples we kept the standard normal densities $f$ and $k$, but increased the sample size $n$ to 100000. The error variance $\sigma_n^2$ was consecutively taken to be 0.01, 1 and 4, i.e. we considered three different noise levels, 1%, 100% and 400%. A transition from the asymptotics described by Theorem 1 to those described by Theorem 2 is clearly visible in the resulting plots, see Figures 5-7. Figure 6 also indicates that there exist intermediate situations not immediately covered by either of the two theorems. Notice that Figure 7 seems to confirm a general, albeit not intuitive message of Theorem 2, which says that the asymptotic standard deviation does not depend on a point $x$, but only on the error density $k$: there is a large neighbourhood around zero for which the sample standard deviation is almost constant.

In our final example we considered the case when the density $f$ is again a mixture of two normal densities (see above for details). The simulation results for this density are reported in Figure 8. In this last example the bandwidth $h_n = 0.44$ was on purpose not selected as a minimiser of $\mathrm{MISE}[f_{nh_n}]$, but was taken to be the same as when estimating a standard normal density (see Figure 7 above). Notice that the sample standard deviation is almost constant in the neighbourhood of the origin and is of the same magnitude as the one depicted in Figure 7. This seems to provide an additional confirmation of the statement of Theorem 2, which says that the limit variance of the estimator $f_{nh_n}$ does not depend on the target density $f$. Also notice that because $h_n$ is relatively large, the smoothed version of $f$, i.e. $f * w_{h_n}$, is unimodal instead of being bimodal.

Figure 3: The sample means and the theoretical means (left display, a dotted and a solid line, respectively) together with the sample standard deviations and the two theoretical standard deviations corresponding to Theorems 1 and 2 (right display, a dotted, a solid and a dashed line, respectively). Here the target density $f$ is a mixture of two normal densities with means equal to $-1$ and $1$ and the same variance 0.375, the mixing probability is 0.5, the density $k$ of a random variable $Z$ is a standard normal density, the noise variance $\sigma_n^2 = 0.01$, the sample size $n = 1000$, the bandwidth $h_n = 0.08$ and the kernel $w$ is given by (9). The number of replications equals $N = 500$. The integral in (11) and not its asymptotic expansion was used to evaluate the standard deviation in Theorem 2.

As a preliminary conclusion (we also considered some other examples not reported here), our simulation examples seem to suggest that the asymptotics given by Theorem 2 correspond to the less realistic scenarios of high noise level and very large sample size. This provides further motivation for the study of deconvolution problems under the assumption $\sigma_n \to 0$ as $n \to \infty$.

4 Proofs 

To prove Theorem 1 we will need the following modification of Bochner's lemma, see Parzen (1962) for the latter.

Lemma 1. Suppose that for all $y$ we have $K_n(y) \to K(y)$ as $n \to \infty$ and that $\sup_n |K_n(y)| \le K^*(y)$, where the function $K^*$ is such that $\int_{-\infty}^{\infty} K^*(y)\,dy < \infty$ and $\lim_{y\to\infty} yK^*(y) = 0$. Furthermore, suppose that $g_n$ is a sequence of densities, such that

$$\lim_{n\to\infty}\sup_{|u|\le\epsilon_n}|g_n(x - u) - f(x)| = 0 \tag{12}$$

for some sequence $\epsilon_n \downarrow 0$, such that $\epsilon_n/h_n \to \infty$ as $n \to \infty$ for a sequence $h_n \to 0$. Then

$$\lim_{n\to\infty}\frac{1}{h_n}\int_{-\infty}^{\infty} K_n\!\left(\frac{x - y}{h_n}\right)g_n(y)\,dy = f(x)\int_{-\infty}^{\infty} K(y)\,dy. \tag{13}$$

Figure 4: The sample means and the theoretical means (left display, a dotted and a solid line, respectively) together with the sample standard deviations and the two theoretical standard deviations corresponding to Theorems 1 and 2 (right display, a dotted, a solid and a dashed line, respectively). Here the target density $f$ and the density $k$ of a random variable $Z$ are standard normal densities, the noise variance $\sigma_n^2 = 0.01$, the sample size $n = 10000$, the bandwidth $h_n = 0.07$ and the kernel $w$ is given by (9). The number of replications equals $N = 500$. The integral in (11) and not its asymptotic expansion was used to evaluate the standard deviation in Theorem 2.



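Proof. The proof follows the same lines as the proof of Lemma 2.1 in Fan (1991b). We have

$$\left|\frac{1}{h_n}\int_{-\infty}^{\infty} K_n\!\left(\frac{x - y}{h_n}\right)g_n(y)\,dy - f(x)\int_{-\infty}^{\infty} K(y)\,dy\right| \le \left|\frac{1}{h_n}\int_{-\infty}^{\infty} K_n\!\left(\frac{x - y}{h_n}\right)g_n(y)\,dy - f(x)\frac{1}{h_n}\int_{-\infty}^{\infty} K_n\!\left(\frac{y}{h_n}\right)dy\right| + f(x)\left|\int_{-\infty}^{\infty} K_n(y)\,dy - \int_{-\infty}^{\infty} K(y)\,dy\right| = I + II.$$

Notice that $II$ converges to zero by the dominated convergence theorem. We turn to $I$. Splitting the integration region into the sets $\{|u| \le \epsilon_n\}$ and $\{|u| > \epsilon_n\}$ for some $\epsilon_n > 0$, we obtain that

$$I \le \left|\int_{\{|u|\le\epsilon_n\}}\bigl(g_n(x - u) - f(x)\bigr)\frac{1}{h_n}K_n\!\left(\frac{u}{h_n}\right)du\right| + \left|\int_{\{|u|>\epsilon_n\}}\bigl(g_n(x - u) - f(x)\bigr)\frac{1}{h_n}K_n\!\left(\frac{u}{h_n}\right)du\right| = III + IV.$$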
Figure 5: The sample means and the theoretical means (left display, a dotted and a solid line, respectively) together with the sample standard deviations and the two theoretical standard deviations corresponding to Theorems 1 and 2 (right display, a dotted, a solid and a dashed line, respectively). Here the target density $f$ and the density $k$ of a random variable $Z$ are standard normal densities, the noise variance $\sigma_n^2 = 0.01$, the sample size $n = 100000$, the bandwidth $h_n = 0.05$ and the kernel $w$ is given by (9). The number of replications equals $N = 500$. The integral in (11) and not its asymptotic expansion was used to evaluate the standard deviation in Theorem 2.



For $III$ we have

$$III \le \sup_{|u|\le\epsilon_n}|g_n(x - u) - f(x)|\int_{-\infty}^{\infty} K^*(u)\,du.$$

By (12) the right-hand side of the above expression vanishes as $n \to \infty$. Now we consider $IV$. Using the fact that $g_n$ is a density (and hence that it is positive and integrates to one), we have

$$IV \le \int_{|u|>\epsilon_n} g_n(x - u)\frac{1}{h_n}K^*\!\left(\frac{u}{h_n}\right)du + f(x)\int_{|u|>\epsilon_n}\frac{1}{h_n}K^*\!\left(\frac{u}{h_n}\right)du \le \frac{1}{\epsilon_n}\sup_{|y|>\epsilon_n/h_n}|yK^*(y)| + f(x)\int_{|y|>\epsilon_n/h_n}K^*(y)\,dy.$$

Notice that the right-hand side in the last inequality vanishes as $n \to \infty$, because we assumed that $\epsilon_n/h_n \to \infty$. Combination of these results yields the statement of the lemma. □



Proof of Theorem 1. The main steps of the proof are similar to those on pp. 1069-1070 of Parzen (1962). Let $\delta$ be an arbitrary positive number. Denote

$$V_{nj} = \frac{1}{h_n}\,w_{r_n}\!\left(\frac{x - X_j}{h_n}\right),$$

where $w_{r_n}$ is defined by (3), and notice that (2) is an average of the i.i.d. random variables $V_{n1}, \ldots, V_{nn}$. We have

$$\mathrm{Var}[V_{nj}] = E[V_{nj}^2] - (E[V_{nj}])^2. \tag{14}$$
Figure 6: The sample means and the theoretical means (left display, a dotted and a solid line, respectively) together with the sample standard deviations and the two theoretical standard deviations corresponding to Theorems 1 and 2 (right display, a dotted, a solid and a dashed line, respectively). Here the target density $f$ and the density $k$ of a random variable $Z$ are standard normal densities, the noise variance $\sigma_n^2 = 1$, the sample size $n = 100000$, the bandwidth $h_n = 0.24$ and the kernel $w$ is given by (9). The number of replications equals $N = 500$. The integral in (11) and not its asymptotic expansion was used to evaluate the standard deviation in Theorem 2.



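Observe that

$$E[V_{nj}^2] = \frac{1}{h_n^2}\int_{-\infty}^{\infty}\left(w_{r_n}\!\left(\frac{x - y}{h_n}\right)\right)^2 g_n(y)\,dy, \tag{15}$$

where $g_n$ denotes the density of $X_j$. Integration by parts gives

$$w_{r_n}(u) = \frac{1}{2\pi}\,\frac{1}{iu}\int_{-1}^{1} e^{-itu}\left(\frac{\phi_w(t)}{\phi_k(r_n t)}\right)'dt$$

and hence

$$|w_{r_n}(u)| \le \frac{1}{2\pi|u|}\int_{-1}^{1}\left|\frac{\phi_w'(t)\phi_k(r_n t) - r_n\phi_w(t)\phi_k'(r_n t)}{(\phi_k(r_n t))^2}\right|dt.$$

Furthermore, $\lim_{n\to\infty} r_n = r < \infty$ implies that there exists a positive number $a$, such that $\sup_n r_n \le a < \infty$. Notice that

$$\inf_{t\in[-1,1]}|\phi_k(r_n t)| = \inf_{s\in[-r_n,r_n]}|\phi_k(s)| \ge \inf_{s\in[-a,a]}|\phi_k(s)|.$$

Therefore

$$|w_{r_n}(u)| \le c_{a,k}\,\frac{1}{|u|}\int_{-1}^{1}\bigl(|\phi_w'(t)| + |\phi_w(t)|\bigr)\,dt, \tag{16}$$

where the constant $c_{a,k}$ does not depend on $n$, but only on the density $k$ and the number $a$. On the other hand

$$|w_{r_n}(u)| \le \frac{1}{2\pi}\int_{-1}^{1}\frac{|\phi_w(t)|}{\inf_{s\in[-a,a]}|\phi_k(s)|}\,dt < \infty. \tag{17}$$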
Figure 7: The sample means and the theoretical means (left display, a dotted and a solid line, respectively) together with the sample standard deviations and the two theoretical standard deviations corresponding to Theorems 1 and 2 (right display, a dotted, a solid and a dashed line, respectively). Here the target density $f$ and the density $k$ of a random variable $Z$ are standard normal densities, the noise variance $\sigma_n^2 = 4$, the sample size $n = 100000$, the bandwidth $h_n = 0.44$ and the kernel $w$ is given by (9). The number of replications equals $N = 500$. The integral in (11) and not its asymptotic expansion was used to evaluate the standard deviation in Theorem 2.

Combining (16) and (17), we obtain that

$$|w_{r_n}(u)| \le \min\left(C_1, \frac{C_2}{|u|}\right), \tag{18}$$

where the constants $C_1$ and $C_2$ do not depend on $n$. Observe that the function on the right-hand side of (18) is square integrable. Next, we have

$$\sup_{|u|\le\epsilon_n}|g_n(x - u) - f(x)| \le \sup_{|u|\le\epsilon_n}|g_n(x - u) - g_n(x)| + |g_n(x) - f(x)| = I + II$$

for an arbitrary $\epsilon_n > 0$. By the Fourier inversion argument for $I$ we obtain

$$|I| \le \sup_{|u|\le\epsilon_n}\left|\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_f(t)\phi_k(\sigma_n t)\bigl(e^{itu} - 1\bigr)\,dt\right| \le \frac{1}{2\pi}\int_{-\infty}^{\infty}|\phi_f(t)|\sup_{|u|\le\epsilon_n}|e^{itu} - 1|\,dt.$$

Let $\epsilon_n \downarrow 0$ as $n \to \infty$. Notice that $\sup_{|u|\le\epsilon_n}|e^{itu} - 1| \le \epsilon_n|t| \to 0$ for every fixed $t$. Furthermore, $\sup_{|u|\le\epsilon_n}|e^{itu} - 1| \le 2$ and $\phi_f$ is integrable. Then by the dominated convergence theorem $I$ will vanish as $n \to \infty$. A similar Fourier inversion argument and another application of the dominated convergence theorem shows that $II$ also vanishes as $n \to \infty$. Thus (12) is satisfied. Now (15), (18) and Lemma 1 imply that

$$E[V_{nj}^2] \sim \frac{1}{h_n}\,f(x)\int_{-\infty}^{\infty}|w_r(u)|^2\,du. \tag{19}$$
Figure 8: The sample means and the theoretical means (left display, a dotted and a solid line, respectively) together with the sample standard deviations and the two theoretical standard deviations corresponding to Theorems 1 and 2 (right display, a dotted, a solid and a dashed line, respectively). Here the target density $f$ is a mixture of two normal densities with means equal to $-1$ and $1$ and the same variance 0.375, the mixing probability is 0.5, the density $k$ of a random variable $Z$ is a standard normal density, the noise variance $\sigma_n^2 = 4$, the sample size $n = 100000$, the bandwidth $h_n = 0.44$ and the kernel $w$ is given by (9). The number of replications equals $N = 500$. The integral in (11) and not its asymptotic expansion was used to evaluate the standard deviation in Theorem 2.



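Furthermore, by Fubini's theorem

$$\begin{aligned}
E[V_{nj}] &= \frac{1}{h_n}\,\frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left(-\frac{itx}{h_n}\right)E\left[\exp\left(\frac{itX_j}{h_n}\right)\right]\frac{\phi_w(t)}{\phi_k(r_n t)}\,dt \\
&= \frac{1}{h_n}\,\frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left(-\frac{itx}{h_n}\right)E\left[\exp\left(\frac{itY_j}{h_n}\right)\right]E\left[\exp\left(\frac{it\sigma_n Z_j}{h_n}\right)\right]\frac{\phi_w(t)}{\phi_k(r_n t)}\,dt \\
&= \frac{1}{h_n}\,\frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left(-\frac{itx}{h_n}\right)\phi_f\!\left(\frac{t}{h_n}\right)\phi_w(t)\,dt.
\end{aligned} \tag{20}$$

The last expression is bounded uniformly in $h_n$ due to Assumption A (i) and (iii), which can be seen by a change of the integration variable $t/h_n = s$. Moreover, using (15), (17) and (19), we have that

$$E\bigl[|V_{nj}|^{2+\delta}\bigr] = \frac{1}{h_n^{2+\delta}}\int_{-\infty}^{\infty}\left|w_{r_n}\!\left(\frac{x - y}{h_n}\right)\right|^{2+\delta}g_n(y)\,dy \tag{21}$$

is of order $h_n^{-1-\delta}$. Combination of the above results now yields

$$\frac{E\bigl[|V_{nj} - E[V_{nj}]|^{2+\delta}\bigr]}{n^{\delta/2}\,(\mathrm{Var}[V_{nj}])^{1+\delta/2}} \to 0 \tag{22}$$

as $h_n \to 0$, $nh_n \to \infty$. Therefore $f_{nh_n}(x)$ satisfies Lyapunov's condition for asymptotic normality in the triangular array scheme, see Theorem 7.3 in Billingsley (1968), and hence it is asymptotically normal, i.e.

$$\frac{f_{nh_n}(x) - E[f_{nh_n}(x)]}{\sqrt{\mathrm{Var}[f_{nh_n}(x)]}} \xrightarrow{\mathcal{D}} \mathcal{N}(0, 1).$$

Formula (4) is then immediate from this fact, formulae (14), (19), (20) and Slutsky's lemma, see Corollary 2 on p. 31 of Billingsley (1968). □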



Proof of Theore m® The proof f ollows the same line of thought as the proof 
of Theorem 1 in Ivan Es and Uhl ( 20051 ). For an arbitrary < e < 1 we have 



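\[
f_{nh_n}(x) = \frac{1}{2\pi n h_n}\sum_{j=1}^{n}\int_{-\epsilon}^{\epsilon}\exp\Bigl(is\,\frac{X_j-x}{h_n}\Bigr)\frac{\phi_w(s)}{\phi_k(s/\rho_n)}\,ds
\tag{23}
\]
\[
\qquad + \frac{1}{2\pi n h_n}\sum_{j=1}^{n}\Bigl(\int_{-1}^{-\epsilon}+\int_{\epsilon}^{1}\Bigr)\exp\Bigl(is\,\frac{X_j-x}{h_n}\Bigr)\frac{\phi_w(s)}{\phi_k(s/\rho_n)}\,ds.
\tag{24}
\]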



The integral in (23) is real-valued, which can be seen by taking its complex
conjugate. Using Assumption B (i), the variance of (23) can be bounded as
follows:



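\[
\begin{aligned}
\mathrm{Var}\Bigl[\frac{1}{2\pi n h_n}\sum_{j=1}^{n}\int_{-\epsilon}^{\epsilon}\exp\Bigl(is\,\frac{X_j-x}{h_n}\Bigr)\frac{\phi_w(s)}{\phi_k(s/\rho_n)}\,ds\Bigr]
&\le \frac{1}{4\pi^2 n h_n^2}\,E\Bigl[\Bigl(\int_{-\epsilon}^{\epsilon}\exp\Bigl(is\,\frac{X_j-x}{h_n}\Bigr)\frac{\phi_w(s)}{\phi_k(s/\rho_n)}\,ds\Bigr)^2\Bigr] \\
&\le \frac{1}{4\pi^2 n h_n^2}\Bigl(\int_{-\epsilon}^{\epsilon}\Bigl|\frac{\phi_w(s)}{\phi_k(s/\rho_n)}\Bigr|\,ds\Bigr)^2 \\
&\le \frac{1}{4\pi^2 n h_n^2}\,(2\epsilon)^2\Bigl(\frac{1}{\inf_{-\epsilon\le s\le\epsilon}|\phi_k(s/\rho_n)|}\Bigr)^2 \\
&= O\Bigl(\frac{1}{\pi^2 n\sigma_n^2}\Bigl(\frac{\epsilon}{\rho_n}\Bigr)^{2-2\lambda_0}\exp\Bigl(\frac{2\epsilon^{\lambda}}{\mu\rho_n^{\lambda}}\Bigr)\Bigr).
\end{aligned}
\]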



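Hence the contribution of (23) minus its expectation is of order

\[
O_P\Bigl(\frac{1}{\sigma_n\sqrt{n}}\,\frac{1}{\rho_n^{1-\lambda_0}}\exp\Bigl(\frac{\epsilon^{\lambda}}{\mu\rho_n^{\lambda}}\Bigr)\Bigr).
\]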



By comparing this to the normalising constant in (5), by Slutsky's lemma we
see that (23) can be neglected when considering the asymptotic normality
of $f_{nh_n}(x)$.

The term (24) can be written as



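\[
\frac{1}{2\pi n h_n C}\sum_{j=1}^{n}\Bigl(\int_{-1}^{-\epsilon}+\int_{\epsilon}^{1}\Bigr)\exp\Bigl(is\,\frac{X_j-x}{h_n}\Bigr)\phi_w(s)\Bigl(\frac{|s|}{\rho_n}\Bigr)^{-\lambda_0}\exp\Bigl(\frac{|s|^{\lambda}}{\mu\rho_n^{\lambda}}\Bigr)\,ds
\tag{25}
\]
\[
+ \frac{1}{2\pi n h_n}\sum_{j=1}^{n}\Bigl(\int_{-1}^{-\epsilon}+\int_{\epsilon}^{1}\Bigr)\exp\Bigl(is\,\frac{X_j-x}{h_n}\Bigr)\phi_w(s)\Bigl(\frac{1}{\phi_k(s/\rho_n)}-\frac{1}{C}\Bigl(\frac{|s|}{\rho_n}\Bigr)^{-\lambda_0}\exp\Bigl(\frac{|s|^{\lambda}}{\mu\rho_n^{\lambda}}\Bigr)\Bigr)\,ds.
\tag{26}
\]

Observe that both (25) and (26) are real. Expression (25) equals

\[
\frac{1}{\pi n h_n C}\,\rho_n^{\lambda_0}\sum_{j=1}^{n}\int_{\epsilon}^{1}\cos\Bigl(s\,\frac{X_j-x}{h_n}\Bigr)\phi_w(s)\,s^{-\lambda_0}\exp\Bigl(\frac{s^{\lambda}}{\mu\rho_n^{\lambda}}\Bigr)\,ds.
\tag{27}
\]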



By formula (21) of van Es and Uh (2005)

\[
\cos\Bigl(s\,\frac{X_j-x}{h_n}\Bigr) = \cos\Bigl(\frac{X_j-x}{h_n}\Bigr) + R_{n,j}(s),
\tag{28}
\]

where $R_{n,j}(s)$ is a remainder term satisfying

\[
|R_{n,j}| \le (|x|+|X_j|)\,\frac{1-s}{h_n},
\tag{29}
\]



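whence by Lemma 5 of van Es and Uh (2005) the expression (27) equals

\[
\begin{aligned}
&\frac{1}{\pi\sigma_n C}\,\rho_n^{\lambda_0-1}\int_{\epsilon}^{1}\phi_w(s)\,s^{-\lambda_0}\exp\Bigl(\frac{s^{\lambda}}{\mu\rho_n^{\lambda}}\Bigr)\,ds\;\frac{1}{n}\sum_{j=1}^{n}\cos\Bigl(\frac{X_j-x}{h_n}\Bigr) + \frac{1}{n}\sum_{j=1}^{n}\tilde R_{n,j} \\
&\quad= \frac{1}{\pi\sigma_n C}\,A\bigl(\Gamma(\alpha+1)+o(1)\bigr)\Bigl(\frac{\mu}{\lambda}\Bigr)^{1+\alpha}\rho_n^{\lambda(1+\alpha)+\lambda_0-1}\,\zeta(\rho_n)\,\frac{1}{n}\sum_{j=1}^{n}\cos\Bigl(\frac{X_j-x}{h_n}\Bigr) + \frac{1}{n}\sum_{j=1}^{n}\tilde R_{n,j},
\end{aligned}
\]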

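where

\[
\tilde R_{n,j} = \frac{1}{\pi\sigma_n C}\,\rho_n^{\lambda_0-1}\int_{\epsilon}^{1}R_{n,j}(s)\,\phi_w(s)\,s^{-\lambda_0}\exp\Bigl(\frac{s^{\lambda}}{\mu\rho_n^{\lambda}}\Bigr)\,ds.
\]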

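By (29) and Lemma 5 of van Es and Uh (2005) the latter expression can be
bounded as

\[
\begin{aligned}
|\tilde R_{n,j}| &\le \frac{1}{\pi\sigma_n C}\,(|x|+|X_j|)\,\frac{1}{h_n}\,\rho_n^{\lambda_0-1}\int_{\epsilon}^{1}(1-s)\,\phi_w(s)\,s^{-\lambda_0}\exp\Bigl(\frac{s^{\lambda}}{\mu\rho_n^{\lambda}}\Bigr)\,ds \\
&= \frac{A}{\pi\sigma_n h_n C}\Bigl(\frac{\mu}{\lambda}\Bigr)^{2+\alpha}\bigl(\Gamma(\alpha+2)+o(1)\bigr)\,\rho_n^{\lambda(2+\alpha)+\lambda_0-1}\,\zeta(\rho_n)\,(|x|+|X_j|).
\end{aligned}
\]

Hence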



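\[
\mathrm{Var}[\tilde R_{n,j}] \le E[\tilde R_{n,j}^2] = O\Bigl(\frac{1}{\sigma_n^2 h_n^2}\,\rho_n^{2(\lambda(2+\alpha)+\lambda_0-1)}\,(\zeta(\rho_n))^2\Bigr).
\]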



Here we used the fact that $E[Y_j^2] + E[Z_j^2] < \infty$ together with the fact that,
being convergent, the sequence $\sigma_n$ is bounded, which implies that $E[X_j^2]$ is
bounded uniformly in $n$. By Chebyshev's inequality it follows that



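\[
\frac{1}{n}\sum_{j=1}^{n}\bigl(\tilde R_{n,j}-E\tilde R_{n,j}\bigr) = O_P\Bigl(\frac{1}{\sigma_n h_n\sqrt{n}}\,\rho_n^{\lambda(2+\alpha)+\lambda_0-1}\,\zeta(\rho_n)\Bigr).
\tag{30}
\]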



After multiplication of this term by the normalising factor from (5) we obtain
that the resulting expression is of order $\rho_n^{\lambda}(\sigma_n h_n)^{-1} = h_n^{\lambda-1}\sigma_n^{-\lambda-1}$. Assumption
B (v) and Slutsky's lemma then imply that the remainder term (30) can be
neglected when considering the asymptotic normality of $f_{nh_n}(x)$.

The variance of (26) can be bounded by



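\[
\frac{1}{4\pi^2 n h_n^2 C^2}\Bigl(\Bigl(\int_{-1}^{-\epsilon}+\int_{\epsilon}^{1}\Bigr)|\phi_w(s)|\Bigl(\frac{|s|}{\rho_n}\Bigr)^{-\lambda_0}\exp\Bigl(\frac{|s|^{\lambda}}{\mu\rho_n^{\lambda}}\Bigr)\,|u(s/\rho_n)|\,ds\Bigr)^2,
\]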



where the function $u$ is given by

\[
u(y) = \frac{C|y|^{\lambda_0}\exp\bigl(-|y|^{\lambda}/\mu\bigr)}{\phi_k(y)} - 1.
\tag{31}
\]



This function is bounded on $\mathbb{R}\setminus(-\delta,\delta)$, where $\delta$ is an arbitrary positive
number. It follows that $u(s/\rho_n)$ is also bounded and tends to zero for all
fixed $s$ with $|s| > \epsilon$ as $\rho_n \to 0$. Hence the variance of (26) is of smaller order
compared to the variance of (25), which can be shown by the dominated
convergence theorem via an argument similar to the one in the proof of
Lemma 5 of van Es and Uh (2005). Therefore by Slutsky's lemma (26) can
be neglected when considering asymptotic normality of (5).
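
The behaviour of $u$ in (31) can be checked on a concrete error law. The sketch below is only an illustration under an assumption that is not made in the text: $Z$ is the sum of an independent standard normal and a standard Laplace variable, so that $\phi_k(t) = e^{-t^2/2}/(1+t^2)$, which has the supersmooth tail form with $C = 1$, $\lambda_0 = -2$, $\lambda = 2$, $\mu = 2$. For this hypothetical choice $u(y) = 1/y^2$ exactly. The ratio is computed on the log scale to avoid underflow.

```python
import numpy as np

# Hypothetical error law (an assumption for illustration only):
# phi_k(t) = exp(-t^2 / 2) / (1 + t^2), i.e. C = 1, lambda_0 = -2, lambda = 2, mu = 2.
C, LAM0, LAM, MU = 1.0, -2.0, 2.0, 2.0

def log_phi_k(y):
    """Logarithm of the assumed error characteristic function."""
    return -0.5 * y**2 - np.log1p(y**2)

def u(y):
    """u(y) = C |y|^lambda_0 exp(-|y|^lambda / mu) / phi_k(y) - 1, as in (31)."""
    y = np.abs(np.asarray(y, dtype=float))
    log_ratio = np.log(C) + LAM0 * np.log(y) - y**LAM / MU - log_phi_k(y)
    return np.expm1(log_ratio)

s = 0.5  # a fixed s with |s| > epsilon
for rho in (0.5, 0.1, 0.01):
    print(rho, float(u(s / rho)))
```

For this example the values of $u(s/\rho_n)$ decrease like $\rho_n^2/s^2$, which illustrates the convergence to zero for fixed $s$ used in the dominated convergence argument.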

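Combination of the above observations yields that it suffices to study

\[
\frac{A}{\pi C}\Bigl(\frac{\mu}{\lambda}\Bigr)^{1+\alpha}\bigl(\Gamma(\alpha+1)+o(1)\bigr)\,U_{nh_n}(x),
\tag{32}
\]

where

\[
U_{nh_n}(x) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}\Bigl(\cos\Bigl(\frac{X_j-x}{h_n}\Bigr) - E\Bigl[\cos\Bigl(\frac{X_j-x}{h_n}\Bigr)\Bigr]\Bigr).
\]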



Observe that

\[
\frac{X_j-x}{h_n} = \frac{Y_j-x}{h_n} + \frac{\sigma_n Z_j}{h_n} = \frac{Y_j-x}{h_n} + \frac{Z_j}{\rho_n},
\]

and that by the same arguments as in the proof of Lemma 6 in van Es and Uh
(2005), both $(Y_j-x)/h_n \bmod 2\pi$ and $Z_j/\rho_n \bmod 2\pi$ converge in distribution
to a random variable with a uniform distribution on $[0, 2\pi]$. Furthermore,
these two random variables are independent. Now notice that for two
independent random variables $W_1$ and $W_2$ the sum $W_1 + W_2 \bmod 2\pi$ equals in
distribution $(W_1 \bmod 2\pi + W_2 \bmod 2\pi) \bmod 2\pi$. Moreover, if $W_1$ and $W_2$
are uniformly distributed on $[0, 2\pi]$, then also $W_1 + W_2 \bmod 2\pi$ is uniformly
distributed on $[0, 2\pi]$, see Scheinok (1965). Using these two facts, by exactly
the same arguments as in the proof of Lemma 6 of van Es and Uh (2005)
we finally obtain that $U_{nh_n}(x) \xrightarrow{\mathcal{D}} \mathcal{N}(0, 1/2)$. The latter in conjunction with
(32) entails the statement of the theorem. □
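
The limit $\mathcal{N}(0, 1/2)$ for $U_{nh_n}(x)$ can be illustrated by a small Monte Carlo experiment. The sketch below is not the authors' simulation. It assumes standard normal $Y$ and $Z$ and the arbitrary values $h_n = 0.05$, $\sigma_n = 0.5$ (so $\rho_n = 0.1$), and the function name `simulate_u` is hypothetical. For normal variables the centring term $E[\cos((X_j-x)/h_n)]$ is available in closed form and is used directly.

```python
import numpy as np

def simulate_u(n=500, reps=2000, x=0.0, h=0.05, sigma=0.5, seed=0):
    """Draw `reps` independent copies of U_{n h}(x) for X = Y + sigma * Z.

    Y and Z are assumed standard normal, so X ~ N(0, 1 + sigma^2) and
    E cos((X - x) / h) = cos(x / h) * exp(-(1 + sigma^2) / (2 h^2)).
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((reps, n))
    z = rng.standard_normal((reps, n))
    obs = y + sigma * z
    mean_cos = np.cos(x / h) * np.exp(-(1.0 + sigma**2) / (2.0 * h**2))
    centred = np.cos((obs - x) / h) - mean_cos
    return centred.sum(axis=1) / np.sqrt(n)

draws = simulate_u()
print("sample mean:", draws.mean())
print("sample variance:", draws.var())  # should be close to 1/2
```

With these settings $(X_j - x)/h_n \bmod 2\pi$ is nearly uniform on $[0, 2\pi]$, so the variance of each cosine term is close to $1/2$, which is the variance of the limit distribution.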



Proof of Theorem 3. The proof employs an approach similar to the proof of
Theorem 2.1 of Fan (1991b). We have



\[
E[V_{nj}^2] = \frac{1}{h_n^2}\int_{-\infty}^{\infty}\Bigl(W_{\rho_n}\Bigl(\frac{x-y}{h_n}\Bigr)\Bigr)^2 g_n(y)\,dy.
\]



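By equation (3.1) of Fan (1991b) (with $h_n$ replaced by $\rho_n$) we have

\[
\Bigl|\rho_n^{\beta}\,\frac{\phi_w(t)}{\phi_k(t/\rho_n)}\Bigr| \le w_0(t),
\]

where $w_0$ is a positive integrable function. Hence by the dominated
convergence theorem

\[
\rho_n^{\beta}\,W_{\rho_n}(y) \to \frac{1}{2\pi C}\int_{-\infty}^{\infty}e^{-ity}\,t^{\beta}\phi_w(t)\,dt.
\]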



Furthermore, again by equation (3.1) of Fan (1991b) we have $|\rho_n^{\beta}W_{\rho_n}(y)| \le C_2$
for some constant $C_2$ independent of $n$ and $y$, while equation (2.7) of Fan
(1991b) implies that $|\rho_n^{\beta}W_{\rho_n}(y)| \le C_1/|y|$. Combination of these two bounds
gives

\[
|\rho_n^{\beta}W_{\rho_n}(y)| \le \min\Bigl(\frac{C_1}{|y|},\,C_2\Bigr).
\tag{33}
\]



Since the fact that $g_n$ satisfies (12) can be shown exactly as in the proof of
Theorem 1, by Lemma 1 we then obtain that

\[
\begin{aligned}
E[V_{nj}^2] &\sim \frac{1}{h_n\rho_n^{2\beta}}\,f(x)\int_{-\infty}^{\infty}\Bigl|\frac{1}{2\pi C}\int_{-\infty}^{\infty}e^{-ity}\,t^{\beta}\phi_w(t)\,dt\Bigr|^2\,dy \\
&= \frac{1}{h_n\rho_n^{2\beta}}\,f(x)\,\frac{1}{2\pi C^2}\int_{-\infty}^{\infty}|t|^{2\beta}|\phi_w(t)|^2\,dt,
\end{aligned}
\tag{34}
\]

where the last equality follows from Parseval's identity. Furthermore, by
Fubini's theorem and the dominated convergence theorem we have



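\[
\begin{aligned}
E[V_{nj}] &= \frac{1}{h_n}\frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\Bigl(-\frac{itx}{h_n}\Bigr)E\Bigl[\exp\Bigl(\frac{itX_j}{h_n}\Bigr)\Bigr]\frac{\phi_w(t)}{\phi_k(t/\rho_n)}\,dt \\
&= \frac{1}{h_n}\frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\Bigl(-\frac{itx}{h_n}\Bigr)\phi_f\Bigl(\frac{t}{h_n}\Bigr)\phi_w(t)\,dt \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\,\phi_f(t)\,\phi_w(h_n t)\,dt \to f(x).
\end{aligned}
\tag{35}
\]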



The dominated convergence theorem is applicable because of Assumption B
(i) and (iii). Finally, let us consider $E[|V_{nj}|^{2+\delta}]$. Writing



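\[
E[|V_{nj}|^{2+\delta}] = \frac{1}{h_n^{2+\delta}}\int_{-\infty}^{\infty}\Bigl|W_{\rho_n}\Bigl(\frac{x-y}{h_n}\Bigr)\Bigr|^{2+\delta} g_n(y)\,dy,
\tag{36}
\]

and using (33) and Lemma 1, we obtain that

\[
E[|V_{nj}|^{2+\delta}] = O\bigl(h_n^{-1-\delta}\rho_n^{-\beta(2+\delta)}\bigr).
\]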



Combination of (34), (35) and (36) yields that Lyapunov's condition is
fulfilled and hence that $f_{nh_n}(x)$ is asymptotically normal. Formula (6) then
follows from (34) and (35). This completes the proof. □
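
The Parseval step in (34) can be verified numerically. The sketch below compares $\int |(2\pi C)^{-1}\int e^{-ity}t^{\beta}\phi_w(t)\,dt|^2\,dy$ with $(2\pi C^2)^{-1}\int |t|^{2\beta}|\phi_w(t)|^2\,dt$ under assumptions that are not taken from the text: $\phi_w(t) = (1-t^2)^3$ on $[-1,1]$, $\beta = 2$ (so that $t^{\beta} = |t|^{\beta}$) and $C = 1$. The outer integral over $y$ is truncated to a finite range, which is harmless here because the inner transform decays quickly.

```python
import numpy as np

BETA = 2.0  # assumed degree of smoothness of the error law (illustrative)
C = 1.0     # assumed constant in the tail condition on phi_k (illustrative)

def phi_w(t):
    """Assumed kernel characteristic function: (1 - t^2)^3 on [-1, 1], zero elsewhere."""
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 3, 0.0)

def trapezoid(values, grid):
    """Composite trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def parseval_sides(y_max=40.0, n_y=1601, n_t=801):
    """Return both sides of the Parseval identity used in (34)."""
    t = np.linspace(-1.0, 1.0, n_t)
    g_hat = np.abs(t) ** BETA * phi_w(t)  # even function of t
    dt = t[1] - t[0]
    weights = np.full(n_t, dt)
    weights[0] = weights[-1] = 0.5 * dt   # trapezoidal weights in t
    y = np.linspace(-y_max, y_max, n_y)
    # g_hat is even, so the Fourier integral reduces to a cosine transform.
    limit_kernel = (np.cos(np.outer(y, t)) @ (g_hat * weights)) / (2.0 * np.pi * C)
    lhs = trapezoid(limit_kernel**2, y)
    rhs = trapezoid(g_hat**2, t) / (2.0 * np.pi * C**2)
    return lhs, rhs

lhs, rhs = parseval_sides()
print(lhs, rhs)
```

The two printed numbers should agree up to quadrature and truncation error, which is the equality between the two lines of (34) for this particular kernel.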